Collecting High Quality

Authors

  • Hui Yang
  • Anton Mityagin
  • Sergey Markov
  • Krysta M. Svore
Abstract

This paper studies the quality of human labels used to train search engines' rankers. Our specific focus is on the performance improvements obtained by using overlapping relevance labels, that is, by collecting multiple human judgments for each training sample. The paper explores whether, when, and for which samples one should obtain overlapping training labels, as well as how many labels per sample are needed. The proposed scheme collects additional labels only for a subset of training samples, specifically for those that are labeled relevant by a judge. Our experiments show that this labeling scheme improves the NDCG of two Web search rankers on several real-world data sets, with a low labeling overhead of around 1.4 labels per sample. This labeling scheme also outperforms several other methods of using overlapping labels, such as simple k-overlap, majority vote, the highest labels, etc. Finally, the paper presents a study of how many overlapping labels are needed to get the best improvement in retrieval accuracy.
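The selective scheme described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the relevance threshold, the number of extra labels, or the tie-breaking rule, so `RELEVANT_THRESHOLD`, `extra`, and the function names below are all hypothetical.

```python
from collections import Counter
from typing import Callable, List

# Assumption: labels are integer grades and any grade >= 1 counts as "relevant".
RELEVANT_THRESHOLD = 1


def selective_labels(get_label: Callable[[], int],
                     extra: int = 2,
                     threshold: int = RELEVANT_THRESHOLD) -> List[int]:
    """Collect one label; request `extra` more only if the first one is relevant."""
    first = get_label()
    labels = [first]
    if first >= threshold:
        labels.extend(get_label() for _ in range(extra))
    return labels


def majority_vote(labels: List[int]) -> int:
    """Most frequent grade; ties go to the higher grade (an assumption)."""
    counts = Counter(labels)
    top = max(counts.values())
    return max(grade for grade, count in counts.items() if count == top)


def highest_label(labels: List[int]) -> int:
    """Aggregate by taking the highest grade any judge assigned."""
    return max(labels)


def labels_per_sample(all_labels: List[List[int]]) -> float:
    """Average number of labels collected per sample (the labeling overhead)."""
    return sum(len(labels) for labels in all_labels) / len(all_labels)
```

Under these assumptions, if a fraction p of samples receive a relevant first label and each of those gets e extra labels, the expected overhead is 1 + p·e labels per sample. An overhead near 1.4 would correspond to p·e ≈ 0.4; the values of p and e actually used in the paper are not given in the abstract.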

Similar resources

The Role of Milk Collecting Centers in Socio-Economic Situation of Dairy Animal Keepers

Most of the poor people in developing countries live in rural areas. Experience shows that in many countries animal husbandry development and increasing investment specially is a way to challenge the poverty in rural areas. This is obvious that, small producers have key role in dairy markets and dairy development but they have no power to bargain and negotiate for higher prices, access to the marke...

Addressing the Resource Bottleneck to Create Large-Scale Annotated Texts

Large-scale linguistically annotated resources have become available in recent years. This is partly due to sophisticated automatic and semiautomatic approaches that work well on specific tasks such as part-of-speech tagging. For more complex linguistic phenomena like anaphora resolution there are no tools that result in high-quality annotations without massive user intervention. Annotated corpo...

Supply chain management of laboratory commodities for tuberculosis in Indonesia: using assessment results to strengthen staff capacity

Background: High quality laboratory diagnosis is critical for any tuberculosis (TB) control program. Reliable and accurate laboratory testing depends on collecting high quality specimens, using careful collection methods, and properly storing and transporting specimens. Although various guidelines for proper collection and handling exist in Indonesia, there was no data on health worker complianc...

Evaluating the Impact of a New Interactive Digital Solution for Collecting Care Quality Information for Residential Homes

Collecting and analysing timely and accurate information about the quality of care that is delivered to older people in residential homes is a challenge. Most current approaches to collecting this information are manual, and add time, cost and human error to them. Interactive digital technologies have the potential to reduce the time consumed, cost and errors in these processes, which in turn c...

Acquiring High Quality Non-Expert Knowledge from On-Demand Workforce

Being expensive and time consuming, human knowledge acquisition has consistently been a major bottleneck for solving real problems. In this paper, we present a practical framework for acquiring high quality non-expert knowledge from on-demand workforce using Amazon Mechanical Turk (MTurk). We show how to apply this framework to collect large-scale human knowledge on AOL query classification in ...

Array-Based Whole-Genome Survey of Dog Saliva DNA Yields High Quality SNP Data

Background: Genome-wide association scans for genetic loci underlying both Mendelian and complex traits are increasingly common in canine genetics research. However, the demand for high-quality DNA for use on such platforms creates challenges for traditional blood sample ascertainment. Though the use of saliva as a means of collecting DNA is common in human studies, alternate means of DNA collec...

Publication date: 2010